feat(integrations): add support for the litellm responses/aresponses APIs #6205

constantinius wants to merge 2 commits into master.

Conversation
Codecov Results 📊
✅ 2187 passed | ⏭️ 154 skipped | Total: 2341 | Pass Rate: 93.42% | Execution Time: 4m 55s
All tests are passing successfully.
❌ Patch coverage is 0.00%. Project has 12726 uncovered lines. Files with missing lines (2)
Generated by Codecov Action
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 9f4b78a.
constantinius force-pushed the branch from 9f4b78a to bb31cad (Compare)
```python
    _input_callback(kwargs)
    _success_callback(
        kwargs, MockResponsesResponse(), datetime.now(), datetime.now()
    )
```
With SDK tests we aim to verify that we generate telemetry based on the user's interaction with the library. We want to assert the presence of telemetry when the patched library is used the way a user would use it.
Currently, we assert that telemetry is generated when _input_callback and _success_callback are each invoked exactly once.
That is not always the case, and this assumption has already resulted in unhandled SDK exceptions, which were fixed in the commit below:
```python
class MockResponsesUsage:
    def __init__(self, input_tokens=12, output_tokens=24, total_tokens=36):
        self.input_tokens = input_tokens
        self.output_tokens = output_tokens
        self.total_tokens = total_tokens


class MockResponsesContentItem:
    def __init__(self, text):
        self.type = "output_text"
        self.text = text


class MockResponsesOutputMessage:
    def __init__(self, text):
        self.type = "message"
        self.role = "assistant"
        self.content = [MockResponsesContentItem(text)]


class MockResponsesResponse:
    def __init__(
        self,
        model="gpt-4.1-nano",
        output=None,
        usage=None,
    ):
        self.id = "resp-test"
        self.model = model
        self.output = output or [MockResponsesOutputMessage("the model response")]
        self.usage = usage or MockResponsesUsage()
```
Related to https://github.com/getsentry/sentry-python/pull/6205/changes#r3201008608, we should aim to avoid custom types in our test suites.
As soon as we introduce custom types, our tests are no longer coupled to the concrete types used in the library, and they no longer verify the SDK contract (namely, that telemetry is generated when the library is used the way a user would use it).
We can't hit real LLM APIs in the tests, but we can do the next best thing: couple the sample response to the types in the library and patch at the lowest possible level.
This is done in most of the tests in this test file, and there are helpers in the repo (such as get_model_response()) that make writing effective tests easier.
```diff
     if hasattr(response, "usage"):
         usage = response.usage
         record_token_usage(
             span,
-            input_tokens=getattr(usage, "prompt_tokens", None),
-            output_tokens=getattr(usage, "completion_tokens", None),
-            total_tokens=getattr(usage, "total_tokens", None),
+            input_tokens=_read_usage_field(usage, "prompt_tokens", "input_tokens"),
+            output_tokens=_read_usage_field(
+                usage, "completion_tokens", "output_tokens"
+            ),
+            total_tokens=_read_usage_field(usage, "total_tokens"),
         )
```
We already probe above to determine which API is used.
As a result, reading prompt_tokens or input_tokens as a fallback is dead code once you know which API you are handling, and it adds cognitive overhead when reading.
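A minimal sketch of the suggestion, where `is_responses_api` is a hypothetical flag standing in for whatever the probe above determines:

```python
def read_usage(usage, is_responses_api):
    # Hypothetical sketch: `is_responses_api` represents the result of the API
    # probe done earlier in the integration, so only one schema is read here.
    if is_responses_api:
        # Responses API usage objects expose input_tokens / output_tokens.
        return usage.input_tokens, usage.output_tokens, usage.total_tokens
    # Chat Completions usage objects expose prompt_tokens / completion_tokens.
    return usage.prompt_tokens, usage.completion_tokens, usage.total_tokens
```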
```python
                set_data_normalized(
                    span, SPANDATA.GEN_AI_RESPONSE_TEXT, response_messages
                )
        elif hasattr(response, "output"):
```
You are adding code here that runs for every type of object that has an output field.
As a result, the branch can easily be triggered accidentally as litellm evolves. There are multiple ways to narrow down whether you have a response in the Chat Completions API schema or in the Responses API schema. For example, based on the signature of the library function, you can check
isinstance(response, (ResponsesAPIResponse, BaseResponsesAPIStreamingIterator)).
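A rough sketch of that narrowing; the import paths below are assumptions about where litellm defines these types and should be verified against the installed version:

```python
from litellm.responses.streaming_iterator import BaseResponsesAPIStreamingIterator
from litellm.types.llms.openai import ResponsesAPIResponse


def is_responses_api_result(response):
    # Narrow on the concrete return types of litellm.responses() instead of
    # probing for an `output` attribute that other objects might also have.
    return isinstance(
        response, (ResponsesAPIResponse, BaseResponsesAPIStreamingIterator)
    )
```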
```python
        normalized = normalize_message_roles(input_messages)  # type: ignore[arg-type]
        messages_data = truncate_and_annotate_messages(normalized, span, scope)
        if messages_data is not None:
            set_data_normalized(
```
Based on the marshaling above you know that messages_data is a list. You should just use span.set_data() when you know the type of an attribute (again, removing cognitive overhead by avoiding dead code).
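For illustration, the suggestion applied to the quoted lines (SPANDATA.GEN_AI_REQUEST_MESSAGES is an assumption about which attribute the surrounding code writes to):

```python
        normalized = normalize_message_roles(input_messages)  # type: ignore[arg-type]
        messages_data = truncate_and_annotate_messages(normalized, span, scope)
        if messages_data is not None:
            # messages_data is already a list here, so write it directly
            # instead of running it through another normalization pass.
            span.set_data(SPANDATA.GEN_AI_REQUEST_MESSAGES, messages_data)
```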
```python
    The usage object can be either a typed Pydantic model (attribute access) or
    a plain dict (litellm hands us a dict for the assembled async-streaming
    response), so we try both shapes.
```
Why don't we just read from the dictionary in the asynchronous streaming scenario and otherwise access the attribute on the Pydantic model 😄?
These responses have types, so an isinstance check can tell you which branch you are in.
In the end we're developing against a library with a finite number of return types, and we should just check which case we are handling instead of probing around. Probing is less robust, since new return types can accidentally trigger hasattr() checks.
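A minimal sketch of that branch, assuming the two shapes described in the docstring above (a plain dict for the assembled async-streaming response, a typed model otherwise):

```python
def get_input_tokens(usage):
    # Sketch only: branch once on the concrete shape instead of probing both
    # shapes for every field.
    if isinstance(usage, dict):
        # litellm hands us a plain dict for the assembled async-streaming response.
        return usage.get("input_tokens")
    # Otherwise it is a typed (Pydantic) usage model with attribute access.
    return usage.input_tokens
```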
```python
                for content_item in getattr(output, "content", []) or []:
                    text = getattr(content_item, "text", None)
                    if text is not None:
                        output_text.append(text)
```
This has reached a lot of indentation for Python code. Usually you can keep code readable by adding early returns or breaking up into functions where appropriate.
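One way to flatten it is to pull the inner loops into a small helper (the name below is illustrative, not the integration's actual one):

```python
def collect_output_text(output_items):
    # Gathers the text parts of a Responses API output list; keeping this in a
    # helper keeps the span-recording code at a shallow indentation level.
    texts = []
    for output in output_items:
        for content_item in getattr(output, "content", []) or []:
            text = getattr(content_item, "text", None)
            if text is None:
                continue
            texts.append(text)
    return texts
```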

Description
Adds support for responses and aresponses, and their differences in output tracking. Also checks the conversation ID if it is passed in extra_args.
Contributes to https://linear.app/getsentry/issue/TET-2287/see-if-we-can-auto-extract-conversationid-from-openai-python
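As a rough illustration of the conversation-ID handling mentioned above (a hypothetical helper; the actual lookup and span attribute in the PR may differ):

```python
def extract_conversation_id(kwargs):
    # Hypothetical sketch: the ID may be passed directly as a kwarg or inside
    # the extra_args mapping, as described in the PR description.
    extra_args = kwargs.get("extra_args") or {}
    return kwargs.get("conversation_id") or extra_args.get("conversation_id")
```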